[57] ABSTRACT

Index keys from JPEG encoded still images are extracted 
based on the gray-scale, luminance, and/or chrominance 
values. The index keys are stored in a database, along with 
corresponding location, and size information. The index key 
of a query image, also encoded in the JPEG format, is 
extracted. The index key of the image query is compared 
with the index keys stored in the meta database, with still 
images having index keys similar to the index key of the 
query image identified. The still images are then retrieved 
and displayed by selection of a user. 
USING INDEX KEYS EXTRACTED FROM
JPEG-COMPRESSED IMAGES FOR IMAGE 
RETRIEVAL 

BACKGROUND OF THE INVENTION 

1. Field of the Invention

The present invention relates to storing and retrieving still
images from large digital still image archives when the still
images are encoded using the JPEG (Joint Photographic
Experts Group) encoding standard, and specifically to the
extraction, based on the coefficients of the Discrete Cosine
Transforms, of still image index keys from JPEG encoded
still images, and to the search for and retrieval of still images
based on the extracted index keys.

2. Description of the Related Art 
Managing large collections of images was once only a
problem for specialists in fields such as remote sensing,
intelligence-gathering, and medical imaging. With the 
growth of multimedia computing and the spread of the 
INTERNET, an increasing number of people can display and 
manipulate images, and use them for a variety of applica- 
tions. 

JPEG is an encoding standard used for digitally encoding, 
typically with a computer, still images for use in the infor- 
mation processing industry. With the JPEG encoding 
standard, still images can be stored on CD-ROMs, magnetic 
storage such as hard drives, diskettes, and tape. Further, the 
JPEG encoding standard allows still images to be transmit- 
ted through computer networks such as ISDN, wide area 
networks, local area networks, the INTERNET, the INTRA- 
NET and other communication channels. 

The JPEG standard for compressing images is mostly 
used as a lossy compression scheme. JPEG can also be 
configured as a lossless method. The JPEG compression 
standard is described in "The JPEG Still Picture Compres- 
sion Standard", by Gregory T. Wallace, Communications of 
the ACM, April 1991, vol. 34, No. 4, pp. 31-44, incorporated 
by reference herein. 

JPEG uses a combination of spatial-domain and 
frequency-domain coding. For grayscale images, the image 
is first divided into 8x8 pixel blocks, each of which is 
transformed into the frequency domain using the Discrete 
Cosine Transform (DCT). Each block of the image is thus 
represented by 64 frequency components. The signal carry- 
ing the JPEG-encoded image tends to concentrate in lower 
spatial frequencies, enabling high-frequency components 
(many of which are usually zero) to be discarded without 
substantially affecting appearance of the image. 
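
The block transform described above can be illustrated with a minimal sketch. It assumes the standard 8x8 DCT-II with JPEG normalization and a level shift of 128; the function name `dct_8x8` and the sample ramp block are hypothetical and chosen only for illustration. A production encoder would use a fast factored DCT rather than this direct summation.

```python
import math

N = 8  # JPEG block size


def dct_8x8(block):
    """Return the 2-D DCT-II (JPEG normalization) of an 8x8 block of samples."""
    def c(k):
        # Normalization factor: 1/sqrt(2) for the zero-frequency term, else 1.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out


# Illustrative input: a smooth horizontal ramp, level-shifted by 128.
block = [[16 * y - 128 for y in range(N)] for x in range(N)]
coeffs = dct_8x8(block)
# coeffs[0][0] is the DC component; the other 63 entries are AC components.
```

Because the ramp does not vary from row to row, every coefficient with a nonzero vertical frequency index is zero (up to rounding), which shows how smooth content leaves most of the 64 components empty.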

A main source of loss of information in JPEG-encoded 
images is a quantization of the DCT coefficients. A table of 
quantization coefficients is used, one per coefficient, usually 
related to human perception of the different frequencies. The 
quantized coefficients are ordered in a "zig-zag" sequence 
by the JPEG compression scheme, starting at the upper left 
(which is the DC coefficient), and scanning the matrix of 
coefficients diagonally, since most of the energy lies in the 
first few coefficients. As a result, most non-zero values 
appear early in the sequence of coefficient values. 
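
The zig-zag ordering described above can be sketched as follows. The helper names `zigzag_order` and `zigzag_scan` are hypothetical; the traversal itself follows the conventional JPEG pattern, which starts at the DC coefficient in the upper left and walks the anti-diagonals in alternating directions.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked bottom-left to top-right,
        # odd diagonals top-right to bottom-left.
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Flatten a square block of quantized coefficients into zig-zag sequence."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

With this ordering the DC coefficient is first and the highest-frequency coefficient is last, so the nonzero values of a typical quantized block cluster at the front of the sequence.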

The final step is entropy coding of the coefficients, using 
either Huffman coding or arithmetic coding. 

With the JPEG encoding standard, color image compres- 
sion can be approximated by compression of multiple gray- 
scale images. In the JPEG encoding standard, color repre- 
sentation is YCrCb, a color scheme in which the luminance 
component and chrominance components are separated, Y is 
a luminance component of color, and CrCb are two
components of chrominance of color. For each four pixels of
luminance, one pixel of Cr and one pixel of Cb are present.
In the JPEG encoding standard, the chrominance information
is subsampled at one-half of a luminance rate in both the
horizontal and vertical directions, giving one value of Cr and
one value of Cb for each 2x2 block of luminance pixels.
Chrominance and luminance pixels are organized into 8x8
pixel blocks (or blocks). Pixel blocks are transformed into
the frequency domain using the DCT operation, resulting in
DC and AC components corresponding to the pixel blocks.

A macroblock comprises four 8x8 blocks of luminance
pixels and one 8x8 block for each of two chrominance
(chroma) components. Therefore, a macroblock comprises
the DCT coefficients for four 8x8 blocks of luminance pixels
and one 8x8 block for each of the two chrominance
components.
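
The relationship between the 2x2 chrominance subsampling and the macroblock layout can be sketched as below. Averaging each 2x2 neighborhood is an assumption made for illustration, since the text states only that one chrominance value is kept per 2x2 block of luminance pixels; the function name `subsample_2x2` is hypothetical.

```python
def subsample_2x2(plane):
    """Halve a chroma plane in both directions by averaging each 2x2 neighborhood.

    Averaging is an assumed choice; the text only fixes the 2:1 ratio.
    """
    h, w = len(plane), len(plane[0])
    return [
        [(plane[y][x] + plane[y][x + 1]
          + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


# A 16x16 region holds four 8x8 luminance blocks; after subsampling,
# each chrominance component of that region fits in a single 8x8 block.
cr_full = [[x + y for x in range(16)] for y in range(16)]
cr_block = subsample_2x2(cr_full)
```

This is why one macroblock carries four luminance blocks but only one block for each chrominance component.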

A problem in the related art is that most image databases 
are not indexed in useful ways, and many are not indexed at 
all. Creating an index of images, then, is a formidable task
using technology of the related art. 

SUMMARY OF THE INVENTION 
An object of the present invention is to index image 
databases by creating an index of the images stored in the
image databases. 

Another object of the present invention is to retrieve 
images from a large database using an image as a query. 
A further object of the present invention is to create an 
index that allows retrieval of images similar to a given query
image. 

Another object of the present invention is to extract index 
keys from images. 
A further object of the present invention is to identify and 
retrieve images that are of the same type (i.e., given an
image of a person's face, then other images of people's faces 
are retrieved; while given an image of a document, then 
other documents are retrieved). 

Still another object of the present invention is to extract 
index keys from still images digitally-encoded, using the 
JPEG encoding standard, based on the components of the 
DCT coefficients. 

A further object of the present invention is to compare 
JPEG images to a given query image relatively quickly, 
without having to fully decompress the images. 

An additional object of the present invention is to retrieve 
from local or remote databases still images similar to a query 
still image using index keys of the database still images and 
the query still image. 

A further object of the present invention is to archive 
index keys of JPEG compressed images. 

Still another object of the present invention is to 
re-encode still images previously encoded using encoding 
standards other than the JPEG encoding format, and extract
and store index keys therefrom. 

A further object of the present invention is a retrieval
method that takes a still image as a query and searches a
database for still images with similar content, facilitating
retrieval of still images from large databases.

An additional object of the present invention is to filter 
images in a large database, and classify the images accord- 
ing to a measure of the differences between each of the 
images and a given image. 
A still further object of the present invention is to create
image index keys in the JPEG compressed domain and 
without reconstructing the image arrays. 
In accordance with the present invention, in still images
encoded using the JPEG encoding standard, DCT components
are determined conventionally. The present invention
is applicable to extracting index keys from grayscale
images. In addition, the present invention is also applicable
to extracting index keys based on color images, which color
information is determined conventionally.

The present invention is a computer-implemented process
which takes advantage of JPEG-encoded still images to
decrease the amount of data that must be processed and to
provide the basis for index keys used for retrieval of similar
images. In the present invention, an index key or a set of
index keys are constructed for each image. The index keys
can be pre-computed for an image database, or can be
computed for each image during the retrieval. Pre-computing
is preferable to increase performance, but is at
times not possible as in the case of images being retrieved
over the INTERNET. In accordance with the present
invention, an index key comprises coefficient keys. Each
coefficient key corresponds to one of the 64 DCT coefficients
resulting from JPEG-encoding of the image and
comprises a predetermined, fixed number of bits selected by
[...].

In the present invention, still images are first encoded
using the JPEG encoding standard. Then, a number of bits
to be used for the coefficient key within the index key is
chosen and remains the same for each image. A number of
windows is selected as twice the number of bits in each
coefficient key. The placement of the windows within each
still image is identified, and remains the same for each
image. Each window within the still image is then paired
with another window within the still image. For each pair of
windows and each DCT coefficient, one bit in the index key
is allocated. Next, for each window in the still image, a
vector of features is calculated, based upon the components
of the DCT coefficients of the JPEG-encoded still image.
Lastly, the bits of the index key for the still image are
calculated, based upon a difference in values between the
vector of features in one window in each window pair and
the other, corresponding window in each window pair.

In accordance with the present invention, an index key is
computed for a query image (an image identified by the user
for use in retrieving other, similar images), and an index key
is computed for each, corresponding image being searched.
Then, and also in accordance with the present invention, the
index key corresponding to the query image is compared
with each index key of the corresponding images being
searched. For each index key being compared with the index
key corresponding to the query image, a measure of the
differences between the index key corresponding to the
query image and each index key being searched is calculated.
Each measure of differences is ordered with respect to
other measures of differences or is compared against a
user-determined threshold level previously selected. For
each index key of the images searched with "lower" measures
of differences between (which are the index keys most
similar to) the index key of the query image, the corresponding
image is retrieved and displayed to the user. The "lower"
measures of differences between the index key of the query
image and index keys of each, corresponding image being
searched are relative to the other index keys being compared
to the index key of the query image.

These together with other objects and advantages which
will be subsequently apparent, reside in the details of the
construction and operation as more fully hereinafter
described and claimed, reference being had to the accompanying
drawings forming a part hereof, wherein like
numerals refer to like parts throughout.

BRIEF DESCRIPTION OF THE DRAWINGS

[...] invention are explained [...] drawings, in which:

FIG. 1 is a flow chart of an overview of index key
extraction from still images, in the present invention;

FIG. 2 is a flow chart of an overview of retrieval of images
similar to the query image, in the present invention;

FIG. 3 is a block diagram of the system architecture for
a computer-based implementation of the present invention;

FIG. 4 is a flow chart describing selecting windows within
a still image, in the present invention;

FIG. 5 is a flow chart of calculating index keys, in the
present invention;

FIG. 6 is an example showing selection of window pairs
and calculation of index keys within one image, in the
present invention;

FIG. 7 shows the relationship between an image, a
window, and a block, in accordance with the present invention;

FIG. 8 shows the organization of coefficient
keys within an index key in the present invention;

FIG. 9 is a flow chart showing calculation of differences
between an index key for a query image and index keys for
corresponding images searched, in the present invention;
and

FIG. 10 is a display showing a demonstration of a query
image and images retrieved by the present invention and
displayed on a user interface.

There are two main aspects of the present invention,
explained below:

(1) index key extraction and archival, and
(2) retrieval of still images using index keys.

Index key extraction and archival refers to extracting
index keys of still images and storing the extracted index
keys of the still images in a database. Retrieval refers to
extracting the index key from a query image (which query
image is one designated by the user for which other, similar
still images are to be identified), then comparing that query
image index key against the index keys representing the
images being searched, which index keys are stored in the
database.

An index key can be extracted, in the present invention,
from a still image partially encoded using the JPEG encoding
standard. A still image at least partially encoded for
index key extraction in the present invention is encoded
(using the JPEG standard) at least to the level of having DCT
coefficients, without arithmetic coding or Huffman coding.
In addition, an index key can be extracted from a still image
at least partially decoded (Huffman or arithmetic decoded),
and having DCT coefficients, in accordance with the present
invention. The method of index key extraction in accordance
with the present invention is preferably applied to a partially
JPEG-encoded image, which is Huffman (or arithmetic)
decoded. Further, an image which has not been JPEG-encoded
or has been digitally encoded using a standard other
than JPEG encoding, or has been Huffman or arithmetic
encoded, is preprocessed by the present invention to place
the image into the above-mentioned partially JPEG-encoded,
Huffman (or arithmetic) decoded, format for index
key extraction.

FIG. 1 is a flow chart of an overview of index key
extraction from still images in the present invention. Referring
now to FIG. 1, in step 100, a user first identifies an
image or a database of images for which corresponding
index keys are to be extracted. As shown in step 102, a query
is first made to determine whether the image is digitally
encoded using the JPEG encoding standard. If not, then in
step 104, the image is digitally encoded, to the stage of
deriving the DCT coefficients, using the JPEG encoding
standard either by encoding the image for the first time or by
decoding the image encoded using a conventional encoding
standard different from the JPEG encoding standard, then
JPEG re-encoding the image. If the image is digitally
encoded using the JPEG encoding standard, then in step 106,
a query is made to determine whether the image is encoded
using Huffman or arithmetic encoding. If so, in step 108, the
image is partially decoded from Huffman or arithmetic
encoding.

At the beginning of step 110, then, the image is digitally
encoded using the JPEG encoding standard, without being
further encoded using Huffman or arithmetic encoding. As
shown in step 110, the image index key for the image is
extracted. To extract the index key, DC components of the
DCT coefficients may be determined and compared for each
selected window pair within the image (which will be
explained in more detail with respect to FIG. 6). In the
example shown in FIG. 1, DC components of the DCT
coefficients are used to extract image index keys. However,
components of the DCT coefficients other than the DC
components may be used (or may be used in conjunction
with the DC components) to extract image index keys. In
extracting multiple image index keys for one image, all
components of the DCT coefficients may be used.

The index key is then stored, as shown in step 112, in an
index key database allocated for the type of index key
extracted (i.e., in the example of FIG. 1, index keys are
extracted using DC components of the DCT coefficients).

If at least a single corresponding index key has not been
extracted at the point of step 114 for each image, steps
100-114 are repeated until at least one corresponding index
key has been extracted for each image.

In accordance with the present invention, each image may
have more than one index key, due to factors such as a
difference in window placement within the image or the
number of bits used in forming the query image, and other
factors. However, to have a meaningful comparison between
index keys, then the above-mentioned factors should remain
constant for each image being compared.

FIG. 2 is a flow chart of an overview of the retrieval of
images similar to the query image in the present invention.
As shown in FIG. 2, a query image, selected by the user, is
identified in step 200 as the image for which similar images
are to be identified and retrieved. In step 202, the index key
is obtained from the query image; the index key could have
been previously extracted and stored in a database or could
be extracted from the query image in step 202.

As shown in step 204, images to be searched and compared
to the query image are identified by the user. The
images to be searched could be identified, for example, by
identifying a database storing images, by identifying an
image input source inputting images or by identifying an
INTERNET Web site. Without departing from the present
invention, the user could also identify individual images for
which difference measures are to be calculated.

In step 206, an index key corresponding to each image
being searched is extracted. If 100 images are to be searched,
100 total index keys are extracted, with one index key
corresponding to each image. As explained above, the index
keys could have been previously extracted and stored in a
database, in which case the database is searched for similarity
to the index key extracted from the query image. Also
as explained above, each image may have multiple index
keys based upon different components, or different combinations
of components, of the DCT coefficients for that
image. However, for meaningful image retrieval using index
keys, then comparison between index keys extracted using
the same components of the DCT coefficients should be
made.

As shown in step 208, the index key extracted from the
query image is compared with each index key extracted
from the images being searched. A difference measure for
each comparison is recorded. The difference measure is
recorded as a difference between the index key extracted
from the query image and the index key extracted from the
search image. Therefore, the lower the number for the
difference measure, the more similar to each other are the
two images corresponding to the index keys being compared.

Next, as shown in step 210, images for which the corresponding
difference is less than a user-determined threshold
(i.e., which are most similar to the query image) are
retrieved and displayed to the user.

In a preferred embodiment of the present invention, an
index key extraction and storage system 10 is a computer-based
system, as shown in FIG. 3. The computer implementing
the index key and storage system 10 could be, for example,
a SUN™ workstation or a Pentium™-based personal computer.
Index key extraction and storage system 10 includes
an image information retrieval system 12 and an image
source 22. In a preferred embodiment of the present
invention, as shown in FIG. 3, a user interface 14, an index
key extraction and archival process 16, and a retrieval
subsystem 18 are software programs running on the computer
implementing index key extraction and storage system
10. Meta database 20 is a database storing extracted index
keys.

Also part of the index key extraction and storage system
10 is the image source 22. Image source 22 provides still
images to the image information retrieval system 12 from
three sources: a JPEG server 24, a live image source 26, and
a network source 28. Other sources of images, of course, are
possible. The still images could be already digitally encoded
in the JPEG format and provided by JPEG server 24.
Further, the still images may be provided from live images,
which are provided by live image source 26, or encoded in the
JPEG format or in a format other than JPEG, which are
provided by network source 28.

Each of JPEG server 24, live image source 26, and
network source 28 may interface to other, respective sources
providing still images thereto and which are not shown in
FIG. 3. A representative source to which JPEG server 24
interfaces is the INTERNET, which may store still images at
remote sites on computers running the UNIX™ operating
system.

Image source 22 provides to the image information
retrieval system 12, still images from each of the foregoing
sources. If still images are provided to the image information
retrieval system 12 from JPEG server 24, no further encoding
is necessary before extraction of the index keys.
However, Huffman-decoding or arithmetic-decoding may be
necessary, which is accomplished by the present invention.

On the other hand, if live image source 26 provides live
images to image information retrieval system 12, the live
images must be compressed by the present invention using
the JPEG encoding standard before the index keys are
extracted. For example, a still image camera or other image
source with image capture capability may interface to live
image source 26.

Likewise, if still images are provided by the network
source 28 to the image information retrieval system 12, the
still images may have been encoded in a format other than
the JPEG format; subsequently, the still images must be
partially re-encoded by the present invention using the JPEG
encoding standard (without being Huffman- or arithmetic-encoded)
before extraction of the index keys may occur.

In the present invention, still images requiring encoding
into the JPEG format are encoded in a conventional way,
using conventional computer hardware and software.

After the JPEG-encoded still images are received by the
image information retrieval system 12, the index key extraction
and archival process 16 extracts a corresponding index
key from each still image.

In addition, the index key extraction and archival process
16 stores the extracted index keys in meta database 20. Meta
database 20 is referred to as a "meta" database because data
that describe other data (i.e., index keys of still images and
other descriptive data thereof such as location from which
the still image was retrieved, size of the still image, title of
the image, etc.) are stored therein. The index key extraction
and archival process 16, in a preferred embodiment, is
implemented in software on any UNIX™-based computer,
personal computer, or other platform, and accomplishes all
of the above-mentioned encoding/re-encoding/partially
decoding required to place the images into the JPEG
partially-encoded format discussed above. The index key
extraction and archival process 16 could also be part of
JPEG encoder or JPEG decoder hardware implementation
board.

On the other hand, if the index key extraction and archival
process 16 receives still images from live image source 26,
the index key extraction and archival process 16 compresses
or partially compresses in a conventional way the still
images using the JPEG encoding standard, extracts the index
keys from the compressed JPEG still images, and stores the
extracted index keys in meta database 20. Similarly, if the
network source 28 transmits still images to the index key
extraction and archival process 16, then the index key
extraction and archival process 16 re-encodes the still
images into the JPEG format, extracts the index keys from
the re-encoded JPEG still images, and stores the index keys
in the meta database 20.

Along with the extracted index keys, the index key
extraction and archival process 16 stores in the meta database
20, other identifying information, such as the location
at which the corresponding still image is stored, the size of
the still image in bytes, and the title of the still image.

Referring again to FIG. 3, the image information retrieval
system 12 uses the index key extracted from a query image
by the present invention to search the meta database 20 for
index keys of still images similar thereto. In addition, the
image information retrieval system 12 could also extract
from images provided by image source 22 corresponding
index keys without having first stored the index keys in meta
database 20. Retrieval system 18 includes a difference
calculation processor 30, an ordering processor 31, and a
fetch and display processor 32. In a preferred embodiment
of the present invention, the images corresponding to the
index keys which are "similar" to (i.e., have a relatively
lower number of differences from) the query image index
key are ordered based on the differences between their
respective index keys and the query image index key.
Alternatively, the images corresponding to the index keys
which are "similar" to the query image index key could be
simply determined to satisfy or not satisfy threshold criteria
predetermined by the user.

The difference calculation processor 30 determines differences
between an index key of a query image and index
keys of the still images stored in meta database 20. The
determination of differences between still images is
explained in detail below.

The ordering processor 31 determines an order of the
differences between the index keys extracted from the still
images and the index key extracted from the query image.

The fetch and display processor 32 contains pointers for
displaying still images whose index keys are stored in meta
database 20. The still images may be stored locally or
remotely, such as at a remote Web site on the INTERNET
and archived using the index key extraction and archival
process 16. In the foregoing example, if the still image is
stored at a remote Web site on the INTERNET, fetch and
display processor 32 tracks the INTERNET node of the still
image, and location on the remote file system. Conventional
processes are used to fetch the image from the location and
to display it.

In a preferred embodiment of the present invention, each
of the difference calculation processor 30, the ordering
processor 31, and the fetch and display processor 32 is a
software program, but could also be implemented in hardware
or firmware.

User interface 14, in the present invention, is front end
software being executed by a computer, in a preferred
embodiment, and written using development kits, such as
VISUAL C++™ or VISUAL BASIC™, in which a user can
submit a still image as a query image and display search
results. An example of a user interface 14 of the present
invention is shown in FIG. 10, and explained with reference
thereto.

In the present invention, the use of the DCT coefficients
provided by the digital encoding of the image using the
JPEG encoding standard to construct index keys is now
explained.

FIG. 4 is a flow chart describing selection of windows
within a still image in accordance with the present invention.
In the present invention, the DCT coefficients provided by
the JPEG-encoded image are used to construct a corresponding
index key. One index key includes a plurality of coefficient
keys, with one coefficient key corresponding to each
DCT coefficient, as explained in the following paragraphs.

Referring to FIG. 4, as shown in step 400, the number of
bits k is selected for the size of each coefficient key within
the index key. The number of windows, then, is selected as
twice (2k) the number of bits k. In a preferred embodiment,
8 bits (16 windows) is selected as the size of each coefficient
key within the index key, although using 16 bits (32
windows) gives greater selectivity in retrieving images
similar to the query image. Whether 8 bits or 16 bits or an
even larger number is selected as the coefficient key size, the
same number of bits must be selected for each coefficient
key within each index key extracted from images being
searched and from the query image. As explained in greater
detail with reference to FIG. 8, if 16 bits are selected for the
size of each coefficient key, and there are 64 coefficient keys
(each coefficient key corresponding to one of the 64 DCT
coefficients for each 8x8 block in the image for gray-scale
images), then the total maximum size of the index key for
each image is 128 bytes (2 bytes/coefficient key x 64 coefficient
keys).

Selecting a larger number of bits k, resulting in a larger
number of windows 2k, means that fewer "tied" scores of
differences between the index keys will occur, providing
more selectivity in retrieving similar images based on a
query image in accordance with the second aspect of the 
present invention. Although using 32 windows (16 bits) 
provides greater selectivity during the retrieval, greater 
selectivity does not necessarily provide more candidate 
images which are "similar" to the query image. Therefore, 
selecting a smaller number of windows (16) is better suited 
to the goal of the present invention of extracting similar 
images. If only the DC component is used for image 
retrieval, then selecting 32 windows is preferred. 

As shown in step 402 of FIG. 4, window coordinates are 
selected to "tile", or cover, the image. The windows may be
selected to fully cover the image or to be smaller samples of 
data. The shape of the windows may be square, rectangular, 
or some other shape, although a rectangular shape is pre- 
ferred. In the present invention, since the coefficient key size 
is selected to be 16 bits, then the number of windows 
selected is 2x16=32. Accordingly, the number of pairs of 
windows for each image is then 16. 
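
The sizing arithmetic for coefficient keys, windows, and window pairs can be checked with a short sketch. The function name `index_key_layout` and the dictionary field names are hypothetical; the relationships themselves (2k windows for k bits, one coefficient key per DCT coefficient, 64 coefficients for gray-scale images) are taken from the description.

```python
def index_key_layout(bits_per_coefficient_key, num_coefficients=64):
    """Derive window count, pair count, and index key size from the key width k."""
    windows = 2 * bits_per_coefficient_key        # 2k windows for k bits
    pairs = windows // 2                          # one window pair per bit
    total_bits = bits_per_coefficient_key * num_coefficients
    return {
        "windows": windows,
        "window_pairs": pairs,
        "index_key_bytes": total_bits // 8,
    }


layout_16 = index_key_layout(16)   # 32 windows, 16 pairs, 128 bytes
layout_8 = index_key_layout(8)     # 16 windows, 8 pairs, 64 bytes
```

For 16-bit coefficient keys this reproduces the figures given in the description: 32 windows, 16 window pairs, and a 128-byte index key.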

In step 404, the size of (number of pixels included in) each 
window may be arbitrarily selected, with all of the windows
completely covering or not completely covering the image. 
In a preferred embodiment, windows do not overlap each 
other. Since the size of each image may vary, the size of each 
window is selected as a function of, and relative to, the size 
of the image. Each window within an image need not be the 
same size (i.e., the size of the first window need not be equal 
to the size of the 32nd window in each image); however, the 
size of each given window must be the same relative to the 
size of each image (the size of the first window in the first 
image must be equal to the size of the first window in the nth 
image, taking into account the relative difference in sizes 
between the first and nth images). 

For compatibility with JPEG 8x8 blocks, the window is 
clipped in each dimension to the largest multiple of 8 less 
than the selected window size. Since the window size is a 
function of image size, the window size is normalized across 
all images in the method of the present invention. If JPEG 
chrominance and luminance values are also used, then the 
JPEG block size is adjusted and the window is clipped in 
each dimension to the largest multiple of the block size less 
than the selected window size. 
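
By way of a non-limiting illustration, the sizing and clipping rules described above may be sketched in Python as follows. The function names `clip_to_block` and `window_size_for_image`, and the idea of expressing each window as a fixed fraction of the image dimensions, are assumptions made for this example and are not taken from the specification. The sketch also assumes that a size which is already an exact multiple of the block size is kept unchanged.

```python
def clip_to_block(size, block=8):
    """Clip a window dimension down to a whole number of JPEG blocks.

    Assumption: a size that is already a multiple of `block` is kept.
    """
    return (size // block) * block


def window_size_for_image(image_w, image_h, frac_w, frac_h, block=8):
    """Return a (width, height) window size scaled to the image.

    frac_w and frac_h are hypothetical per-window fractions of the image
    dimensions, so a given window has the same relative size in every image.
    """
    w = clip_to_block(int(image_w * frac_w), block)
    h = clip_to_block(int(image_h * frac_h), block)
    return w, h
```

Passing a larger `block` value (for example 16) corresponds to the adjusted block size mentioned for the chrominance and luminance case.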

Next, the windows within the image are randomly paired, 
with the constraint that each window has a partner. The 
window pairs must remain constant across all images being 
searched and the query image. In a preferred embodiment of 
the present invention, each window within each pair of 
windows should be selected "far" from the corresponding 
window within the window pair to minimize the possibility 
of each window within the window pair being from the same 
region of the image. For each image, the relationship of the 
DCT coefficients between each window in each window pair
is used to build the coefficient keys, which then build the
index key for the image. If both windows within each 
window pair are from the same region of the image, then the 
comparison between DCT coefficients from each window 
may not provide as meaningful of a result. 
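
One possible way to obtain a random pairing that stays constant across all images is sketched below in Python. The function name `pair_windows` and the use of a fixed random seed are assumptions of this example; the sketch does not attempt to enforce the preference that paired windows be "far" from each other.

```python
import random


def pair_windows(num_windows, seed=0):
    """Randomly pair window indices so that every window has a partner.

    A fixed seed (an assumption of this sketch) makes the pairing
    reproducible, so the same pairs are used for every image and the query.
    """
    if num_windows % 2 != 0:
        raise ValueError("an even number of windows is required")
    order = list(range(num_windows))
    random.Random(seed).shuffle(order)
    # Consecutive entries of the shuffled order form one pair each.
    return [(order[i], order[i + 1]) for i in range(0, num_windows, 2)]
```

With 32 windows this yields 16 pairs, and calling the function again with the same seed returns the same pairs.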

Each DCT coefficient in each window pair is then 
assigned to one of the bits in the index key, with a first bit 
of each coefficient key in the index key corresponding to the 
first window pair, second bit corresponding to the second 
window pair, etc. The value of each bit within the coefficient 
key is computed as explained with reference to FIG. 5. 

As shown in step 500 of FIG. 5, in forming an index key
for an image, a pair of windows within that image is 
selected. Next, in step 502 one bit in each coefficient key in 
the index key for the image is selected for the window pair. 
In a preferred embodiment, the same bit is allocated to the
same, respective window pair in each image being com- 
pared. 

In step 504, for each window in the window pair selected, 
a "vector of features" is calculated. The vector of features for
one of the windows in the window pair comprises the DCT 
coefficients, conventionally computed as explained above, 
for each 8x8 block of the window. More specifically, the 
vector is a list of 64 DCT frequency components/coefficients 
for the block. If more than one 8x8 block exists in the 
window, then an average of each DCT coefficient over all of 
the 8x8 blocks in the window is taken, providing a vector of 
64 feature values. The "vector of features" could also be any
other arbitrary function applied to the window that computes 
a value representative of the window. 
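
The averaging of DCT coefficients over the 8x8 blocks of a window may be illustrated by the following Python sketch. The function name `window_feature_vector` and the representation of each block as a flat list of 64 values are assumptions made for the example.

```python
def window_feature_vector(blocks):
    """Average each of the 64 DCT coefficients over the blocks of a window.

    `blocks` is a non-empty list of 8x8 blocks, each given as a flat list of
    64 coefficient values (a representation assumed for this sketch).
    """
    if not blocks:
        raise ValueError("a window must contain at least one 8x8 block")
    if any(len(b) != 64 for b in blocks):
        raise ValueError("each block must hold 64 DCT coefficients")
    n = len(blocks)
    return [sum(b[k] for b in blocks) / n for k in range(64)]
```

For a window holding a single block, the result is simply that block's 64 coefficients.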

Then, for each feature value in the vector of features 
across all window pairs within the image, a coefficient key 
is calculated in step 506 as follows. For each feature value 
(and for each window pair), the value of a feature from the
first window in the window pair is compared to the value of 
the same, respective feature in the corresponding window in 
the window pair. If the difference in the above-mentioned 
values is greater than a threshold value predetermined by the 
user, then a logical "1" is assigned as the corresponding bit 
in the coefficient key corresponding to the DCT coefficient. 
On the other hand, if the difference in the above-mentioned 
values is less than or equal to the threshold value, a logical 
"0" is assigned as the corresponding bit in the coefficient key 
corresponding to the DCT coefficient. 
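
The thresholded comparison may be sketched in Python as shown below. The function name `comparison_bit` is hypothetical. The text does not state whether the difference is signed; this sketch assumes the signed difference (first window minus second window), so that a threshold of zero reduces to a plain "greater than" test.

```python
def comparison_bit(first_value, second_value, threshold):
    """Return 1 if first_value exceeds second_value by more than `threshold`.

    Assumption: the difference is signed (first minus second); the text does
    not say whether an absolute difference is intended instead.
    """
    return 1 if (first_value - second_value) > threshold else 0
```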

In a preferred embodiment of the present invention, the 
need for an explicit threshold value can be avoided by taking 
advantage of the JPEG compression scheme. Since the DCT 
coefficients are already quantized during JPEG compression, 
then if the value of the DCT coefficient from the first 
window in the window pair is greater than the value of the 
same, respective DCT coefficient from the second window 
in the window pair, a logical "1" is assigned to the corre- 
sponding bit (coefficient key) in the index key. Otherwise, a 
logical "0" is assigned to the corresponding bit in the index 
key. However, the differences must be greater than the step 
size used in the quantization of the DCT coefficients by the 
JPEG encoding method. 

In addition in the present invention, the above-mentioned 
vector of features may be something other than the average 
of the DCT components from each 8x8 block in the window. 
For example, the vector of features may be the number of 
edge points in a particular window, the average of gray 
values within the window, or the variance of the gray values 
within the window. Accordingly, different definitions of the 
differences (and "similarity") between images may be used 
in the present invention. Further, in the present invention, 
multiple measures of the differences may be used for robust- 
ness or to capture specific features within the images. 
Therefore, more than one index key may be obtained for an 
image by changing the definition of the vector of features, as 
discussed above. In a preferred embodiment of the present 
invention, though, the DCT coefficients already computed 
by JPEG compression are used to compute the index key for 
each image.

Referring again to FIG. 5, in step 508, if one bit has not 
been computed for each DCT component in each window 
pair in the image, steps 500-508 are repeated. If all bits in 
the index key have been computed for the image, then in step 
510 the index key is stored as the index to the image. A 
pointer to the image is also stored along with the index key. 
Accordingly, a database of index keys and pointers for each, 
respective image is established. 

The index keys in the present invention are derived from
relationships between window pairs, as shown in FIG. 6. For 
each image 34, a number of window pairs is selected, in 
which every window pair corresponds to one part of the 
index key 36. Therefore, the number of window pairs 
determines the length of the index key, as discussed with 
reference to FIG. 4. Each window of a window pair corre- 
sponds to one block of the image or region which covers 
multiple blocks in the image, in the example of FIG. 6. After 
comparing the DCT coefficients for each window in a given
window pair, as discussed above, a bit value in coefficient
key kw1 corresponding to the first DCT component com-
parison for the window pair w1-w1' is determined. Then, a
bit value in the coefficient key kw1 corresponding to the first
DCT component comparison for the window pair w2-w2' is
determined, etc. Likewise, kw2 and kw3 are determined, in
the example of FIG. 6. 

As shown in FIG. 6, image 34 includes three examples of 
window pairs: w1 and w1', w2 and w2', and w3 and w3'. The
positions of window pairs are selected in advance, but are 
fixed for the entire meta database 20 (shown in FIG. 3) of 
index keys and for an index key of an index still image. A 
single still image, accordingly, may have multiple index 
keys stored in multiple meta databases 20. For example, if 
one set of index keys, stored in one meta database 20, 
concentrating on the middle of images is desired, matching 
windows are chosen accordingly. On the other hand, if index 
keys stored in another meta database 20 are meant to 
correspond to background areas, window positions are cho- 
sen accordingly. 

In the present invention, in a preferred embodiment, 16 
window pairs exist for each image. Having 16 window pairs 
is preferred because computers most efficiently store binary 
numbers in groups of 2^n, and unsigned integers are typically
stored in 32 bits using many software compilers (for 
example, many compilers of "C" language computer
software). 

FIG. 7 shows a block in a JPEG window 38 corresponding 
to an 8x8 pixel area. As previously discussed, more than one 
8x8 pixel block may be included in each window. As noted 
previously, in the case of multiple 8x8 blocks in one 
window, the DCT coefficients are averaged to determine the 
values for that window. Also as previously discussed, most 
of the DCT coefficients are typically 0, meaning that the
corresponding coefficient key is 0 and contributes little to the 
retrieval of similar images. Ignoring the zero values can 
result in substantial efficiencies, without necessarily reduc- 
ing the effectiveness of indexing. Accordingly, if only 10 
DCT coefficients are non-zero, then the index key would 
require only 20 bytes to be stored for the index key, with 
additional data being stored as the pointer to the correspond- 
ing image. The same relative bits and/or the same relative 
coefficient keys would need to be compared across all index 
keys to have a meaningful comparison and retrieval. 
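
The storage figure given above follows from one bit per window pair per retained DCT coefficient. A small Python sketch of the arithmetic is given below; the function name `index_key_bytes` is an assumption of the example, and the pointer to the image is not counted.

```python
def index_key_bytes(num_coefficients, num_window_pairs=16):
    """Storage for an index key: one bit per window pair per coefficient."""
    total_bits = num_coefficients * num_window_pairs
    return (total_bits + 7) // 8  # round up to whole bytes
```

With 16 window pairs, 10 retained coefficients give 160 bits, or 20 bytes, whereas all 64 coefficients give 1024 bits, or 128 bytes.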

Referring now to FIG. 8, an organization of an index key 
for an image is shown. FIG. 8 shows an example of an index 
key corresponding to 16 window pairs. As shown in FIG. 8, 
coefficient key 1 contains bit kw1,1 through bit kw16,1. Bit
kw1,1 is a result of comparison of the first DCT component
in each window belonging to a first window pair. Bit kw16,1
is a result of comparison of the first DCT component in each
window belonging to a sixteenth window pair. Coefficient
key 64 contains bit kw1,64 through bit kw16,64. Bit kw1,64
is a result of comparison of the 64th DCT component in each
window belonging to a first window pair. Bit kw16,64 is a
result of comparison of the 64th DCT component in each 
window belonging to a sixteenth window pair. 
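
The organization of FIG. 8 may be illustrated by the following Python sketch, which packs one bit per window pair into each of 64 coefficient keys using the threshold-free comparison of the preferred embodiment. The function name `build_index_key`, the input layout, and the placement of the first window pair in the least significant bit are all assumptions made for the example.

```python
def build_index_key(pair_features):
    """Build 64 coefficient keys from the feature vectors of the window pairs.

    `pair_features` is a list of (first_vector, second_vector) tuples, one per
    window pair, each vector holding 64 values. Bit p of coefficient key k is
    1 when the k-th value of the first window exceeds that of the second.
    Placing window pair 1 in the least significant bit is an assumption.
    """
    keys = []
    for k in range(64):
        key = 0
        for p, (first, second) in enumerate(pair_features):
            if first[k] > second[k]:
                key |= 1 << p
        keys.append(key)
    return keys
```

With 16 window pairs, each coefficient key fits in a 16-bit unsigned integer.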

A second main aspect of the present invention, retrieval of
still images using the index keys, will now be described. 

FIG. 9 is a flow chart of the retrieval of images based on 
the index keys. In accordance with the present invention, 
image retrieval is based on differences between index keys 
extracted by the present invention as described above. 

As shown in step 900, a query image (which is an image 
to which similar images are being identified and retrieved) 
is first selected. If the index key from the query image has 
not been extracted, then the query image index key is 
extracted at this point. Also in step 900, a source for the 
images being searched is selected. The source could be a 
database of images, or any one of the image sources 
described with reference to FIG. 3. All index keys are 
extracted based on windows at the same relative positions, 
and scaled based on image size. 

In step 902, a counter i is initialized to 0. Counter i 
indicates the number of the index key being searched. In 
steps 904 and 906, if an index key corresponding to each 
image being searched has not been extracted, then the 
corresponding index key is extracted for each image. 

In step 908, an index key corresponding to an image being 
searched is selected. The index keys need not be searched in 
any particular order because a difference between the query 
image index key and the index key being searched is 
determined based on whether a difference between the index 
keys being compared is less than a threshold. 

Next, in step 910, a variable, difference_i, is initialized to
0. Difference_i stores a total measure of differences between
the two index keys being compared.

In step 912, a "degree of match" is calculated between 
each corresponding coefficient key in the index keys being 
compared, with each coefficient key corresponding to one of 
the 64 DCT coefficients. The "degree of match" is deter- 
mined as the sum of all bit positions corresponding to the 
particular DCT coefficient that are different between the 
query image index key and the index key being searched. 

As shown in step 914, the difference_i between the query
image index key and the index key_i being searched is
calculated as the total over all of the coefficient keys
compared of the degree of match between the corresponding
coefficient keys. Difference_i, then, is the total of the number
of bit positions in which one of the two index keys being
compared is storing "1" and the other of the two index keys
being compared is storing "0". For the above-mentioned
calculation, the Hamming distance between the index keys
being compared is used. For example, the Hamming dis-
tance measure between 0101 and 1011 is 3 because the
number of different bits between 0101 and 1011 is 3. The
difference_i is stored as the difference between the query
image index key and the index key_i. In the case of the
present invention, there is one key for each DCT coefficient, 
making 64 coefficient keys per image, with each coefficient 
key having 16 bits. 
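
The difference calculation of steps 912 and 914 may be sketched in Python as follows. The function names `hamming` and `index_key_difference`, and the storage of each coefficient key as a non-negative integer, are assumptions of this example.

```python
def hamming(a, b):
    """Number of bit positions in which integers a and b differ."""
    return bin(a ^ b).count("1")


def index_key_difference(query_key, other_key):
    """Total difference between two index keys.

    Each index key is a list of coefficient keys stored as non-negative
    integers (a representation assumed for this sketch).
    """
    if len(query_key) != len(other_key):
        raise ValueError("index keys must hold the same number of coefficient keys")
    return sum(hamming(q, o) for q, o in zip(query_key, other_key))
```

Applied to the example above, `hamming(0b0101, 0b1011)` evaluates to 3.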

The "best" match score (the match score or difference 
score indicating that the index key searched is most similar 
to the query image index key) between index keys compared 
is "0" (all coefficient keys are identical). The "worst" match 
score (the match score or difference score indicating that the 
index key searched is least similar to the query image index 
key) between index keys compared is "1024" (no keys are 
identical). In practice, depending upon the method by which 
the windows are selected and the number of DCT 
coefficients, the match scores are typically in the range of 0 
to 200. 

In steps 916 and 918, if all index keys being searched have 
not been compared to the query image index key, then 
counter i is incremented, and steps 908 to 916 are repeated. 

Next, in step 920, each difference_i is compared to the
predetermined threshold, and all images for which
difference_i is less than the threshold are retrieved and
presented to the user as being "similar" to the query image.
Sensitivity of the search depends, in part, upon selection of
the threshold value. If a larger threshold value, for example
150, is selected, then more matches are typically found and
more images are returned. If a smaller threshold value, for
example 90, is selected, then fewer matches are typically
found, and fewer images are returned.

In the present invention, the similarity of the index keys 
compared to the query image index key can also be rank-
ordered according to their difference scores. The ordered
results of the images corresponding to the index keys 
determined to be "similar" to the query image can then be 
presented to the user to browse. 
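
The thresholding and rank-ordering described above may be combined as in the following Python sketch. The function name `retrieve_similar` and the representation of the database as (index key, pointer) entries are assumptions made for the example and do not describe the structure of meta database 20.

```python
def retrieve_similar(query_key, database, threshold):
    """Return (difference, pointer) pairs below `threshold`, most similar first.

    `database` is an iterable of (index_key, pointer) entries, where an index
    key is a list of integer coefficient keys. These structures are
    assumptions of this sketch.
    """
    results = []
    for index_key, pointer in database:
        # Hamming distance summed over corresponding coefficient keys.
        diff = sum(bin(q ^ k).count("1") for q, k in zip(query_key, index_key))
        if diff < threshold:
            results.append((diff, pointer))
    results.sort(key=lambda item: item[0])
    return results
```

Raising the threshold admits more entries into the result list, and lowering it admits fewer, consistent with the sensitivity behavior described above.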

In the present invention, if the query image is a document, 
then other document images would return low difference 
scores (meaning that the other document images are more 
similar to the query image). If the query image is a passport 
photograph, then other passport photograph images would 
return low difference scores (meaning that the other passport 
photograph images are more similar to the query image). 

The images returned as being similar to the query image 
could then be processed by a more sophisticated and restric- 
tive set of algorithms with the expectation that most of the 
images to which the algorithms are applied will have similar 
characteristics. 

FIG. 10 shows an example of an implementation of a user
interface 14 described herein above with reference to FIG.
3. A user interface 14 shown in FIG. 10 is implemented
using a TCL/TK tool kit running on an X WINDOWS™
platform. When a user selects "Search by Image" 40, a topic
such as "NEWS" 42, and a query image 44 (which is shown
as the upper left most icon in the workspace for retrieval
area 46), then the present invention searches extracted index
keys from a meta database 20 (described herein above with 
reference to FIG. 3) and displays results of that search in the 
workspace for retrieval area 46. Resultant still images are 
ordered according to level of difference from the query 
image, from left to right across rows, then from top to 
bottom of columns. Screen window 48 displays a thumbnail 
(or reduced image) of a still image selected. 

The present invention is not limited to the embodiments
described above, but also encompasses variations thereof.

For example, in the present invention, one DCT coeffi-
cient (the (0,0) or DC coefficient) may be used in computing
difference measures for matching images. Further, choosing
three, six, or ten of the DCT coefficients (in place of one
DCT coefficient), which give symmetry about the diagonal 
of the 8x8 block when taken in the JPEG "zig-zag" order, 
yields differing index keys for the same image. In addition, 
as mentioned previously, the windows may be selected in 
differing sizes. 
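
The symmetry property mentioned above can be checked with a short Python sketch that generates the zig-zag scan order of an 8x8 block. The function names `zigzag_order` and `is_symmetric` are hypothetical and introduced only for this illustration; positions are given as (row, column) with the DC coefficient at (0, 0).

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + column indexes one anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals run from bottom-left to top-right.
        order.extend(reversed(cells) if s % 2 == 0 else cells)
    return order


def is_symmetric(cells):
    """True if the set of positions is unchanged by swapping row and column."""
    return {(c, r) for r, c in cells} == set(cells)
```

The first one, three, six, or ten positions each cover whole anti-diagonals of the block, which is why those counts are symmetric about the diagonal while, for example, the first two positions are not.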

Search results depend upon window size and the number
of coefficients used in constructing the index keys. For
example, when all 64 coefficients are used, using a
16-window method produces better results than using the
32-window method. However, as the size of the database
increases, the size of the coefficient keys is expected to
need to be increased.

In addition, as the size of the windows increases, the 8x8 
blocks within the samples may also be sampled, instead of 
using the average over all of them to determine the DCT 
coefficient for the window. 

The JPEG compression scheme and analogous compres- 
sion schemes are attractive for still image indexing and other 
image-processing operations because images encoded using
them are encoded as a combination of spatial components 
and frequency components. The frequency components are 
band-pass filtered, which can be used to approximate image 
operations such as edge detection and texture analysis. The 
spatial coherence of the JPEG scheme enables adjacency 
relations to be retained, which is important for object 
detection and for syntactic methods that rely on relative 
positions or orientations of regions in an image. 

In addition, the method of the present invention can be 
adapted to other compression schemes, such as wavelet- 
based methods. 

The present invention allows retrieval of images given an 
image as a query. The present invention capitalizes on
pre-processing already performed on JPEG-encoded 
images, allowing for images to be compared rapidly and a 
determination to be made with accuracy whether the image 
belongs to a same class as the query image. 

As the use of images becomes more pervasive, and 
especially as more and more use is made of remotely- 
accessed image databases, the approach used by the present 
invention in classifying an image as a member of a class 
substantially reduces the burden of accessing images for 
remote browsing or for further processing. 

The many features and advantages of the invention are 
apparent from the detailed specification and, thus, it is 
intended by the appended claims to cover all such features 
and advantages of the invention which fall within the true 
spirit and scope of the invention. Further, since numerous 
modifications and changes will readily occur to those skilled 
in the art, it is not desired to limit the invention to the exact 
construction and operation illustrated and described, and 
accordingly all suitable modifications and equivalents may 
be resorted to, falling within the scope of the invention. 

What is claimed is: 

1. An apparatus comprising:

a source providing still images, said source comprising:
a JPEG server providing the still images encoded using
a JPEG encoding standard,
a live image source providing the still images, and
a network source providing the still images encoded
using an encoding standard;

an image information retrieval system, coupled to the 
source, extracting index keys from the still images and 
comparing an index key extracted from a query image 
to the index keys, said image information retrieval 
system comprising: 

an index key extraction and archival section one of 
encoding and partially re-encoding using the JPEG
encoding standard the still images if the still images 
are not encoded using the JPEG encoding standard, 
and extracting the index keys from the still images if 
the still images are encoded using the JPEG encod- 
ing standard, 

a database, coupled to the index key extraction and 
archival section, storing the index keys, along with 
identifying data of the respective still images corre-
sponding to the index keys, and 

a retrieval subsystem, coupled to the database, com- 
paring the index key of a query image with the index 
keys stored in the database, and determining differ- 
ence between the index key of the query image and 
each of the index keys stored in the database, said 
retrieval subsystem comprising: 
a difference calculator calculating the difference, 
an ordering unit, coupled to the difference calculator, 
ordering the index keys based on the difference, 
and 
a fetch and display unit, coupled to the ordering unit,
for retrieving still images corresponding to the
index keys; and
a user interface, coupled to the image retrieval subsystem,
displaying the still images.

2. A method for classifying image data, the method
comprising executing the following steps in a digital data
processing device:

recognizing an even number of windows that are subsets
of the image;

applying a mathematical function to image data in each of
the windows to yield a respective numerical value for
each of the windows;

pairwise associating the windows to define at least one
pair of windows;

comparing the respective numerical values within each of
the at least one pair to yield a respective comparison
value for each of the at least one pair;

forming an index key from the respective comparison
value; and

embodying the index key in a medium readable by the
digital data processing device.

3. A digital data processing device comprising:
a processor;

a device readable medium embodying the index key
produced according to the method of claim 2.

4. The digital data processing device of claim 3 wherein
the image data is a query image and the processor is adapted
to retrieve images from a database using the index key as a
query.

5. The digital data processing device of claim 3 wherein
the image data is a data base image and the processor is
adapted to match the index key to a query for retrieving the
image data from the database.

6. The device of claim 5 wherein the medium also
embodies the image data.

7. The device of claim 3 wherein the medium also
embodies the image data.

8. The device of claim 3 wherein the processing device is
adapted to accept user queries specifying a desired distance
between a retrieved image and a query.

9. The device of claim 8 wherein the desired distance is
specified by a Hamming measure.

10. The method of claim 2 wherein the image is a JPEG
compressed image and each of the windows is a respective
block resulting from JPEG compression.

11. The method of claim 2 wherein the mathematical
function comprises choosing a coefficient of the discrete
cosine transform of each of the respective ones of the blocks.

12. The method of claim 2 wherein the coefficient is a DC
coefficient.

13. The method of claim 2 wherein the mathematical
function comprises an average of gray values in the window.

14. The method of claim 2 wherein the mathematical
function comprises a variance of gray values in the window.

15. The method of claim 2 wherein the mathematical
function comprises a number of edge points in the window.

16. A medium readable by a digital data processing device
and embodying the index keys created by the method of
claim 2.

17. The method of claim 2 wherein the image data
comprises a stored database archive of images.

18. The method of claim 2 wherein the image data
comprises a user-supplied query image.

19. The method of claim 2 wherein the image data
comprises a stored database image.

20. The method of claim 2 wherein the processing step
yields a plurality of index keys.